Game of Thrones Ratings Per Episode Per Season
⚠ Warning: This Website Contains Spoilers
“How have the ratings of Game of Thrones episodes evolved over time across different seasons?”
This research question explores trends in the data, such as whether ratings rose or fell over the course of the show and whether any sharp drops followed key plot developments.
The raw data was obtained from the IMDB website which is publicly accessible via this link:
https://www.imdb.com/title/tt0944947/ratings/
The data used in this analysis was extracted from the ‘Ratings by episode’ section of the Game of Thrones page.
I created an Excel spreadsheet that replicates the grid shown on the IMDB page, with one rating per episode, making the data accessible for wrangling and visualisation.
| Library and Version | Purpose |
|---|---|
| tidyverse_2.0.0 | for handling data |
| here_1.0.1 | for easy file and directory referencing |
| readxl_1.4.3 | for reading excel files |
| knitr_1.49 | for combining R code with text to create dynamic reports |
| dplyr_1.1.4 | for data manipulation tasks |
| jpeg_0.10.10 | for reading jpeg files |
| gganimate_1.0.9 | for creating animated plots |
| plotly_4.10.4 | for an interactive plot |
| showtext_0.9.7 | for changing fonts |
## New names:
## • `` -> `...1`
The header of the first column in the spreadsheet was empty, so R has automatically named that column “...1” on import. First, I am going to print the data to see how it looks initially after importing.
## # A tibble: 8 × 12
## ...1 e1 e2 e3 e4 e5 e6 e7 e8 e9 e10 e11
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 s1 8.9 8.6 8.5 8.6 9 9.1 9.1 8.9 9.6 9.4 9.2
## 2 s2 8.6 8.3 8.7 8.6 8.6 8.9 8.8 8.6 9.7 9.3 NA
## 3 s3 8.6 8.4 8.7 9.5 8.9 8.7 8.6 8.9 9.9 9.1 NA
## 4 s4 9 9.7 8.7 8.7 8.6 9.7 9 9.7 9.6 9.7 NA
## 5 s5 8.3 8.3 8.3 8.5 8.5 7.9 8.8 9.8 9.4 9.1 NA
## 6 s6 8.4 9.2 8.6 9 9.7 8.3 8.5 8.3 9.9 9.9 NA
## 7 s7 8.5 8.8 9.1 9.7 8.7 9 9.4 NA NA NA NA
## 8 s8 7.6 7.9 7.5 5.5 5.9 4 NA NA NA NA NA
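The `...1` name in the printout above comes from the name repair applied on import. As a minimal sketch of that behaviour, assuming the import relied on `read_excel()`'s default `.name_repair = "unique"`, the same repair can be reproduced directly with `vctrs::vec_as_names()`. The header vector below is a shortened, illustrative stand-in for the spreadsheet's first row.

```r
library(vctrs)

# Illustrative header row: an empty first header followed by episode columns
headers <- c("", "e1", "e2")

# "unique" repair replaces empty names with ...<position> and leaves the rest alone
repaired <- vec_as_names(headers, repair = "unique", quiet = TRUE)
print(repaired)
```

Giving the first column a header in the spreadsheet itself (for example "Season") would avoid the repair step altogether.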
The data has imported successfully. The next step is to wrangle it into a form that can be easily visualised.
#rename the first column after automatic assignment of "...1"
colnames(rawdata) <- ifelse(colnames(rawdata) == "...1", "Season", colnames(rawdata))
#rename the remaining columns, leaving the first column (just renamed to "Season") unchanged
colnames(rawdata)[-1] <- c("Episode 1", "Episode 2", "Episode 3", "Episode 4", "Episode 5", "Episode 6", "Episode 7", "Episode 8", "Episode 9", "Episode 10", "Episode 11")
#this is a sanity check to make sure the column headers changed
head(rawdata, n = 1)
## # A tibble: 1 × 12
## Season `Episode 1` `Episode 2` `Episode 3` `Episode 4` `Episode 5` `Episode 6`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 s1 8.9 8.6 8.5 8.6 9 9.1
## # ℹ 5 more variables: `Episode 7` <dbl>, `Episode 8` <dbl>, `Episode 9` <dbl>,
## # `Episode 10` <dbl>, `Episode 11` <dbl>
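The same renaming can also be expressed as a single pipeline. This is only a sketch of an alternative, not the approach used above: it runs on a small toy tibble (two seasons, two episodes) and builds the "Episode n" labels from the existing `e1`, `e2`, ... names instead of typing them out.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the imported data, with values copied from the first two rows and columns
toy_raw <- tibble(`...1` = c("s1", "s2"), e1 = c(8.9, 8.6), e2 = c(8.6, 8.3))

renamed <- toy_raw %>%
  rename(Season = `...1`) %>%                      # name the first column
  rename_with(~ paste("Episode", sub("^e", "", .x)),
              .cols = matches("^e[0-9]+$"))        # e1 -> "Episode 1", e2 -> "Episode 2"

names(renamed)
```

Deriving the labels from the existing names means the code does not need editing if the number of episode columns changes.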
#change the values in the first column
#removing the 's' just to clean up the view of the table
rawdata$Season <- sub("s", "", rawdata$Season)
#render the table with kable
kable(rawdata, format = "markdown")
| Season | Episode 1 | Episode 2 | Episode 3 | Episode 4 | Episode 5 | Episode 6 | Episode 7 | Episode 8 | Episode 9 | Episode 10 | Episode 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.5 | 8.6 | 9.0 | 9.1 | 9.1 | 8.9 | 9.6 | 9.4 | 9.2 |
| 2 | 8.6 | 8.3 | 8.7 | 8.6 | 8.6 | 8.9 | 8.8 | 8.6 | 9.7 | 9.3 | NA |
| 3 | 8.6 | 8.4 | 8.7 | 9.5 | 8.9 | 8.7 | 8.6 | 8.9 | 9.9 | 9.1 | NA |
| 4 | 9.0 | 9.7 | 8.7 | 8.7 | 8.6 | 9.7 | 9.0 | 9.7 | 9.6 | 9.7 | NA |
| 5 | 8.3 | 8.3 | 8.3 | 8.5 | 8.5 | 7.9 | 8.8 | 9.8 | 9.4 | 9.1 | NA |
| 6 | 8.4 | 9.2 | 8.6 | 9.0 | 9.7 | 8.3 | 8.5 | 8.3 | 9.9 | 9.9 | NA |
| 7 | 8.5 | 8.8 | 9.1 | 9.7 | 8.7 | 9.0 | 9.4 | NA | NA | NA | NA |
| 8 | 7.6 | 7.9 | 7.5 | 5.5 | 5.9 | 4.0 | NA | NA | NA | NA | NA |
The table above is a much cleaner version of the data, but it is not yet ready for visualisation. Before plotting, I am going to remove ‘Episode 11’. This column holds the rating for the unaired original pilot, which is not required in the analysis: audiences never saw this episode, and it was simply an alternate to the official pilot episode that was released. ‘Episode 11’ is therefore excluded from the final dataset. In the code below, this is done by filtering it out after the data has been reshaped to long format.
# Reshape the data to long format for a more flexible structure for visualizing, analyzing, and modeling data. This is easier for ggplot2 to handle.
rawdata_long <- rawdata %>%
  pivot_longer(cols = starts_with("Episode"), # Select columns that start with "Episode"
               names_to = "Episode", # Create a new column "Episode"
               values_to = "Rating") # Create a new column "Rating"
# Exclude Episode 11 from the data
data <- rawdata_long %>%
  filter(str_replace(Episode, "Episode ", "") != "11")
#this is a sanity check to make sure the data is now in a long format
head(data)
## # A tibble: 6 × 3
## Season Episode Rating
## <chr> <chr> <dbl>
## 1 1 Episode 1 8.9
## 2 1 Episode 2 8.6
## 3 1 Episode 3 8.5
## 4 1 Episode 4 8.6
## 5 1 Episode 5 9
## 6 1 Episode 6 9.1
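Dropping the ‘Episode 11’ column before reshaping and filtering it out afterwards should give the same long-format result. The sketch below checks this on a toy tibble with two seasons. The object names `toy`, `drop_first` and `filter_after` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy wide-format data: one regular episode plus the unaired pilot column
toy <- tibble(
  Season = c("1", "2"),
  `Episode 1` = c(8.9, 8.6),
  `Episode 11` = c(9.2, NA)
)

# Route A: drop the column first, then reshape
drop_first <- toy %>%
  select(-`Episode 11`) %>%
  pivot_longer(cols = starts_with("Episode"),
               names_to = "Episode", values_to = "Rating")

# Route B: reshape first, then filter the rows out
filter_after <- toy %>%
  pivot_longer(cols = starts_with("Episode"),
               names_to = "Episode", values_to = "Rating") %>%
  filter(Episode != "Episode 11")

# Compare the two results
all.equal(drop_first, filter_after)
```

Route A keeps the unwanted values out of the long table from the start, while Route B keeps the wide table untouched in case it is needed again.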
#Create a basic line plot with minimal customisation
p <- ggplot(data, aes(x = as.integer(str_replace(Episode, "Episode ", "")), # Convert episode to numeric
                      y = Rating,
                      color = factor(Season))) + # Use Season for different lines
  geom_line() + # Draw lines
  geom_point() + # Add points for each episode
  labs(x = "Episode Number", # Label for x-axis
       y = "Episode Rating", # Label for y-axis
       color = "Season Number") + # Label for the legend
  theme_minimal() + # Use a minimal theme for a clean look
  scale_color_viridis_d() + # Add color scale for different lines
  theme(legend.position = "right") # Place the legend at the right
#view the plot as a sanity check to assess what direction to take the customisations.
print(p)
The x-axis breaks have defaulted to intervals of 2.5. This needs recoding so that the axis shows whole episode numbers. To do this, I ‘mutate’ the data: “Episode ” is removed from each label and the remaining number (e.g., “1”, “2”) is converted into an integer, creating a new column called ‘EpisodeNumber’ that the later plot uses for its x-axis and breaks.
# Preprocess the Episode column
data1 <- data %>%
  mutate(EpisodeNumber = as.integer(str_replace(Episode, "Episode ", "")))
# This is a sanity check to view the new column
print(data1)
## # A tibble: 80 × 4
## Season Episode Rating EpisodeNumber
## <chr> <chr> <dbl> <int>
## 1 1 Episode 1 8.9 1
## 2 1 Episode 2 8.6 2
## 3 1 Episode 3 8.5 3
## 4 1 Episode 4 8.6 4
## 5 1 Episode 5 9 5
## 6 1 Episode 6 9.1 6
## 7 1 Episode 7 9.1 7
## 8 1 Episode 8 8.9 8
## 9 1 Episode 9 9.6 9
## 10 1 Episode 10 9.4 10
## # ℹ 70 more rows
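As an alternative to removing the prefix by hand, `readr::parse_number()` extracts the number from each label directly. This is a sketch of a different route to the same ‘EpisodeNumber’ values, not the method used above. The vector `episode_labels` is an illustrative sample.

```r
library(readr)

# A few labels in the same format as the Episode column
episode_labels <- c("Episode 1", "Episode 2", "Episode 10")

# parse_number() drops the non-numeric prefix and returns doubles;
# as.integer() matches the <int> type of the column created above
episode_numbers <- as.integer(parse_number(episode_labels))
print(episode_numbers)
```

This version does not depend on the exact wording of the prefix, so it would still work if the labels were changed to, say, "Ep 1".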
Next, I wanted to add a more personal customisation to the colours on the visualisation. To do this, I assigned a family house sigil to each of the seasons based on major plot points:
| Season Number | House | Major Plot Point |
|---|---|---|
| Season 1 | Stark | Only time all the Starks are together & Death of Eddard Stark |
| Season 2 | Baratheon | Death of Renly Baratheon & Battle of Blackwater |
| Season 3 | Lannister | Jaime Lannister loses his hand & The Red Wedding |
| Season 4 | Martell | Death of Oberyn Martell |
| Season 5 | Tyrell | Margaery Tyrell manipulates King’s Landing |
| Season 6 | Arryn | Saved the day at the Battle of the Bastards |
| Season 7 | Greyjoy | Yara Greyjoy declares herself Queen of the Iron Islands |
| Season 8 | Targaryen | Daenerys Targaryen gets the Iron Throne |
# Convert the 'Season' column (stored as text "1" to "8") to a factor with appropriate labels
data1$Season <- factor(data1$Season,
                       levels = 1:8,
                       labels = c("Season 1", "Season 2", "Season 3", "Season 4", "Season 5", "Season 6", "Season 7", "Season 8"))
# Assign custom colors to each line based on the season
custom_colors <- c(
  "Season 1" = "#7f7f7f", # Grey for Season 1, House Stark
  "Season 2" = "#ffc406", # Yellow for Season 2, House Baratheon
  "Season 3" = "#B03060", # Maroon for Season 3, House Lannister
  "Season 4" = "#ED7014", # Orange for Season 4, House Martell
  "Season 5" = "#006400", # Green for Season 5, House Tyrell
  "Season 6" = "#023E8A", # Blue for Season 6, House Arryn
  "Season 7" = "#000000", # Black for Season 7, House Greyjoy
  "Season 8" = "#ff0000" # Red for Season 8, House Targaryen
)
Season 1 : #7f7f7f
Season 2 : #ffc406
Season 3 : #B03060
Season 4 : #ED7014
Season 5 : #006400
Season 6 : #023E8A
Season 7 : #000000
Season 8 : #ff0000
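Because `scale_color_manual()` matches a named colour vector to the factor levels by name, a quick sanity check that every level has a colour can catch a typo before plotting. The sketch below rebuilds the factor and the palette on their own. The objects `season_chr`, `season_fct` and `unmatched` are illustrative names, not part of the original code.

```r
# Season values as they arrive in the long-format data: text "1" to "8"
season_chr <- as.character(1:8)

# Same conversion as above; factor() matches the text values to the integer levels
season_fct <- factor(season_chr, levels = 1:8, labels = paste("Season", 1:8))

custom_colors <- c(
  "Season 1" = "#7f7f7f", "Season 2" = "#ffc406",
  "Season 3" = "#B03060", "Season 4" = "#ED7014",
  "Season 5" = "#006400", "Season 6" = "#023E8A",
  "Season 7" = "#000000", "Season 8" = "#ff0000"
)

# Levels with no colour assigned; an empty result means every season is covered
unmatched <- setdiff(levels(season_fct), names(custom_colors))
print(unmatched)

# Any NA here would mean a Season value did not match the supplied levels
anyNA(season_fct)
```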
# Create the plot with new customisations
p1 <- ggplot(data1, aes(x = EpisodeNumber, y = Rating, color = factor(Season))) +
  geom_line() +
  geom_point() +
  labs(x = "Episode Number",
       y = "Episode Rating",
       color = "", # Label for the legend
       caption = "Source: IMDB.com") + # Add source text at the bottom
  ggtitle("Game of Thrones Episode Ratings Per Season") + # Add a title
  theme_minimal() + # Clean, minimal theme
  scale_color_manual(values = custom_colors) + # Apply custom colors for lines
  scale_x_continuous(breaks = seq(1, max(data1$EpisodeNumber), by = 1)) + # Set x-axis breaks
  scale_y_continuous(
    breaks = seq(4, 10, by = 0.5), # Set y-axis breaks
    limits = c(4, 10), # Set y-axis limits
    expand = c(0, 0) # Remove extra padding
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(
    keywidth = 2, # Adjust the size of the legend key (box around the color circle)
    keyheight = 2, # Adjust the size of the legend key (box around the color circle)
    override.aes = list(size = 5) # Increase the size of the color circles inside the legend
  ))
# Display the plot
print(p1)
#load font
font_add_google("Merriweather", "Merriweather", regular.wt = 400)
showtext_auto()
p1_font <- p1 +
  theme(
    text = element_text(family = "Merriweather"),
    plot.title = element_text(family = "Merriweather", face = "bold")
  )
print(p1_font)
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).
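The 7 removed rows in these warnings most likely correspond to the NA ratings in the table: Season 7 has no episodes 8 to 10 and Season 8 has no episodes 7 to 10, which gives 3 + 4 = 7 empty slots. This is a reading of the data shown earlier, not something the plotting code reports explicitly. As a minimal sketch, the count can be checked and the empty slots dropped before plotting. The tibble `short_seasons` is a hand-built illustrative subset, not an object from the analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Ratings for the two short seasons, copied from the table above;
# NA marks episode slots that do not exist in that season
short_seasons <- tibble(
  Season = rep(c("Season 7", "Season 8"), each = 10),
  EpisodeNumber = rep(1:10, times = 2),
  Rating = c(8.5, 8.8, 9.1, 9.7, 8.7, 9.0, 9.4, NA, NA, NA,
             7.6, 7.9, 7.5, 5.5, 5.9, 4.0, NA, NA, NA, NA)
)

# Count the empty slots per season
na_counts <- short_seasons %>%
  group_by(Season) %>%
  summarise(missing = sum(is.na(Rating)))
print(na_counts)

# Dropping the empty slots before plotting would avoid the warning
plot_ready <- short_seasons %>%
  drop_na(Rating)
nrow(plot_ready)
```

Applying the same `drop_na(Rating)` step to the full long-format data before calling `ggplot()` should leave the plot unchanged, since the removed rows carry no rating to draw.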